bigger picture
Seeing the Bigger Picture: 3D Latent Mapping for Mobile Manipulation Policy Learning
Kim, Sunghwan, Chung, Woojeh, Dai, Zhirui, Bhatt, Dwait, Shukla, Arth, Su, Hao, Tian, Yulun, Atanasov, Nikolay
In this paper, we demonstrate that mobile manipulation policies utilizing a 3D latent map achieve stronger spatial and temporal reasoning than policies relying solely on images. We introduce Seeing the Bigger Picture (SBP), an end-to-end policy learning approach that operates directly on a 3D map of latent features. In SBP, the map extends perception beyond the robot's current field of view and aggregates observations over long horizons. Our mapping approach incrementally fuses multiview observations into a grid of scene-specific latent features. A pre-trained, scene-agnostic decoder reconstructs target embeddings from these features and enables online optimization of the map features during task execution. A policy, trainable with behavior cloning or reinforcement learning, treats the latent map as a state variable and uses global context from the map obtained via a 3D feature aggregator. We evaluate SBP on scene-level mobile manipulation and sequential tabletop manipulation tasks. Our experiments demonstrate that SBP (i) reasons globally over the scene, (ii) leverages the map as long-horizon memory, and (iii) outperforms image-based policies in both in-distribution and novel scenes, e.g., improving the success rate by 25% for the sequential manipulation task.
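As a rough illustration of the mapping idea above, the sketch below keeps a sparse voxel grid of latent features. For each observed point, it runs gradient descent on the containing voxel's feature so that a frozen decoder reproduces a target embedding. This is not the authors' implementation. The following are all assumptions made for illustration: the linear decoder `W`, the per-voxel squared-error loss, the random initialization, the voxel size, and the mean-pooling `global_context`, which stands in for SBP's learned 3D feature aggregator. `LatentMap`, `fuse`, and `voxel_index` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
VoxelIndex = Tuple[int, int, int]


def voxel_index(point: Sequence[float], voxel_size: float) -> VoxelIndex:
    """Map a 3D point to the integer index of the voxel that contains it."""
    x, y, z = (math.floor(c / voxel_size) for c in point)
    return (x, y, z)


class LatentMap:
    """Hypothetical sparse grid of scene-specific latent features with a frozen decoder."""

    def __init__(self, decoder: List[List[float]], latent_dim: int,
                 voxel_size: float, lr: float = 0.1, seed: int = 0) -> None:
        self.decoder = decoder          # E x D matrix, never updated (stand-in for a pre-trained decoder)
        self.latent_dim = latent_dim    # D
        self.voxel_size = voxel_size
        self.lr = lr
        self.features: Dict[VoxelIndex, Vector] = {}
        self._rng = random.Random(seed)

    def _feature(self, idx: VoxelIndex) -> Vector:
        # Lazily allocate a small random feature the first time a voxel is observed.
        if idx not in self.features:
            self.features[idx] = [self._rng.gauss(0.0, 0.01) for _ in range(self.latent_dim)]
        return self.features[idx]

    def decode(self, f: Vector) -> Vector:
        # Linear decoder: returns W @ f.
        return [sum(w * x for w, x in zip(row, f)) for row in self.decoder]

    def fuse(self, point: Sequence[float], target: Vector, steps: int = 1) -> float:
        """Refine the feature of the voxel containing `point` toward `target`.

        Runs `steps` gradient-descent steps on ||W f - target||^2 with respect to
        the voxel feature f only (the decoder stays fixed), then returns the
        remaining squared error.
        """
        f = self._feature(voxel_index(point, self.voxel_size))
        for _ in range(steps):
            residual = [y - t for y, t in zip(self.decode(f), target)]
            # Gradient of ||W f - t||^2 with respect to f is 2 * W^T (W f - t).
            grad = [2.0 * sum(row[d] * r for row, r in zip(self.decoder, residual))
                    for d in range(self.latent_dim)]
            for d in range(self.latent_dim):
                f[d] -= self.lr * grad[d]
        return sum((y - t) ** 2 for y, t in zip(self.decode(f), target))

    def global_context(self) -> Vector:
        # Mean-pool all voxel features: a crude placeholder for a learned 3D aggregator.
        if not self.features:
            return [0.0] * self.latent_dim
        n = len(self.features)
        return [sum(f[d] for f in self.features.values()) / n for d in range(self.latent_dim)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]   # toy 2x3 decoder (E=2, D=3)
    latent_map = LatentMap(W, latent_dim=3, voxel_size=0.5)
    err = latent_map.fuse((0.25, 0.74, 1.01), [0.3, -0.2], steps=200)
    latent_map.fuse((2.0, 0.0, 0.0), [1.0, 1.0], steps=200)
    print(f"squared error after fitting: {err:.2e}")
    print("voxels in map:", sorted(latent_map.features))
    print("global context:", latent_map.global_context())
```

Because the decoder is held fixed and only the per-voxel features change, repeated observations of the same voxel keep refining one stored feature. This loosely mirrors how the abstract describes aggregating observations over long horizons. A real system would use a learned nonlinear decoder and a learned aggregator in place of these toy choices.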
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Vempati, Shashank, Anand, Nishit, Talebailkar, Gaurav, Garai, Arpan, Arora, Chetan
Conventional optical character recognition (OCR) techniques segmented each character and then recognized it. This made them prone to character-segmentation errors and left them without the context needed to exploit language models. Advances in sequence-to-sequence translation over the last decade led to modern techniques that first detect words and then feed one word at a time to a model that directly outputs the full word as a sequence of characters. This allowed better use of language models and bypassed the error-prone character-segmentation step. We observe that this shift has moved the accuracy bottleneck to word segmentation. Hence, in this paper, we propose a natural and logical progression from word-level OCR to line-level OCR. The proposal bypasses errors in word detection and provides a larger sentence context for better use of language models. We show that the proposed technique improves not only the accuracy but also the efficiency of OCR. Despite a thorough literature survey, we did not find any public dataset to train and benchmark such a shift from word-level to line-level OCR. Hence, we also contribute a meticulously curated dataset of 251 English page images with line-level annotations. Our experiments revealed a notable end-to-end accuracy improvement of 5.4%, underscoring the potential benefits of transitioning toward line-level OCR, especially for document images. We also report a fourfold improvement in efficiency compared to word-based pipelines. With continuous improvements in large language models, our methodology also has the potential to exploit such advances. Project Website: https://nishitanand.github.io/line-level-ocr-website
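The abstract does not say how its end-to-end accuracy figure is computed. As one hedged illustration of why word-segmentation mistakes matter, the sketch below uses character error rate (CER). CER is a common OCR metric, but we assume it here rather than take it from the paper. The sketch compares two made-up transcriptions of the same line. In one, a hypothetical word detector merged two words into a single box. The other was recognized at line level. `levenshtein` and `character_error_rate` are illustrative helpers, not part of the authors' code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (0.0 means a perfect transcription)."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


if __name__ == "__main__":
    reference = "why stop at words"
    # Hypothetical word-level output: the detector merged "stop" and "at" into one box,
    # so the recognizer emitted them without the separating space.
    word_level = "why stopat words"
    # Hypothetical line-level output: the whole line is recognized in one pass.
    line_level = "why stop at words"
    print("word-level CER:", character_error_rate(reference, word_level))
    print("line-level CER:", character_error_rate(reference, line_level))
```

A single merged box costs one edit here (a CER of 1/17). A detector that drops a word entirely would cost every character of that word plus its space. A line-level pipeline sidesteps that class of error because it has no word-detection stage.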
- Europe > Switzerland (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
Modeling extremely large images with xT
As computer vision researchers, we believe that every pixel can tell a story. However, a writer's block seems to be settling into the field when it comes to dealing with large images. Large images are no longer rare: the cameras we carry in our pockets and those orbiting our planet snap pictures so big and detailed that they stretch our current best models and hardware to their breaking points. Generally, we face a quadratic increase in memory usage as a function of image size. Today, we make one of two sub-optimal choices when handling large images: down-sampling or cropping.
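The arithmetic below makes the memory claim concrete. It assumes a ViT-style model that splits the image into 16×16 patches and applies full self-attention across all of them. The post does not name an architecture, and the patch size, fp32 storage, and single attention head are illustrative assumptions. Under those assumptions, the attention matrix has one entry per pair of tokens, so its size is quadratic in the number of tokens and hence in the pixel count. `num_patch_tokens` and `attention_matrix_bytes` are hypothetical helper names.

```python
def num_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch tokens (assumes sides divisible by `patch`)."""
    return (height // patch) * (width // patch)


def attention_matrix_bytes(height: int, width: int, patch: int = 16,
                           bytes_per_value: int = 4, heads: int = 1) -> int:
    """Bytes needed to materialize the N x N attention matrix for one layer."""
    n = num_patch_tokens(height, width, patch)
    return n * n * bytes_per_value * heads


if __name__ == "__main__":
    for side in (224, 1024, 4096):
        n = num_patch_tokens(side, side)
        gib = attention_matrix_bytes(side, side) / 2**30
        print(f"{side}x{side}: {n} tokens, {gib:.4f} GiB of attention scores per head per layer")
```

Under these assumptions, a 4096×4096 image yields 65,536 tokens and a 16 GiB attention matrix per head per layer. Down-sampling it to 1024×1024 cuts that by a factor of 256 at the cost of fine detail. Cropping keeps the detail but discards the surrounding context. Treat these as back-of-the-envelope figures for naive attention, not measurements of any particular model.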
Is AI threatening SEO strategy?
Artificial intelligence has taken over the SEO industry in recent months. With the emergence of AI-driven tools like ChatGPT, which can understand and perform all kinds of SEO tasks, a perennial question is once again making its way into headlines: Is the SEO industry dying? First, if you're impressed with ChatGPT and think it may be threatening SEO, you may be surprised to learn that, according to the tool itself, the SEO industry is going to be just fine: AI may change the way SEO is done, but it is unlikely to replace the entire profession. While AI can automate certain aspects of SEO, such as keyword analysis and technical site audits, it still requires human expertise and creativity to develop and implement effective strategies. Additionally, search engines themselves are constantly evolving and becoming more sophisticated, making them a moving target for AI to keep up with.
Driving smarter customer experiences with AI and machine learning
Artificial intelligence (AI) is demonstrating its ability to stimulate growth in both digital-native and non-digital-native businesses. According to Deloitte, businesses across sectors are using AI to create business value. From streamlining data analytics to improving customer experiences, AI offers businesses several benefits. AI is most beneficial when it is integrated into an organization's core products or services and its business processes.
Is pharma AI on the brink of an investment boom?
The pharma industry is seeing an increase in artificial intelligence (AI) investment across several key metrics, according to an analysis of GlobalData's data. AI is gaining a growing presence across multiple sectors, with top companies completing more AI deals, hiring for more AI roles, and mentioning AI more frequently in company reports at the start of 2021. GlobalData's thematic approach to sector activity seeks to group key company information on hiring, deals, patents, and more by topic to see which companies are best placed to weather the disruptions coming to their industries. These themes, of which AI is one, are best thought of as "any issue that keeps a CEO awake at night", and tracking them makes it possible to ascertain which companies are leading the way on specific issues and which are dragging their heels. By this measure, Novartis, GlaxoSmithKline, and Merck & Co. are classed as dominant players in AI in the sector, with an additional 19 companies classified as leaders.
When Should Health Systems Invest in New Tech?
In my long tenure as CIO at a large academic health system, I was often accosted by senior members of the medical, nursing, or administrative staff just back from a meeting: "I saw a demo of this next-generation electronic-health-records system. It saves lives, reduces costs, improves the patient experience, and mows lawns! We need to implement it here!" Or a board member would weigh in: "We need to be aggressively pursuing artificial intelligence! AI will be incredibly disruptive, possibly replacing most of our physicians."
The Artist in the Machine: The bigger picture of AI and creativity
Welcome to AI book reviews, a series of posts that explore the latest literature on artificial intelligence. Will machines ever be able to replace or replicate human creativity? That is a question we repeatedly ask ourselves as we continue to innovate and invent new creative tools. The printing press, the gramophone, the camera, the camcorder, the typewriter, the synthesizer, word processors, photo editing software, and many other tools we have invented over the past centuries have brought fundamental changes to creativity and the arts. But what has remained constant throughout history is the human element.
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
What to Ask When Implementing Machine Learning
Successfully operationalizing machine learning models in production environments can be incredibly difficult, as the industry has already seen. In fact, Gartner has predicted that 85 percent of AI projects in the next few years will fail to produce results. So how do you set your AI projects up for success? ML implementation requires a distinct approach that marks a complete shift from the traditional approach to IT projects. Data and analytics are the centerpieces of ML projects.
How data science can answer cybersecurity challenges - JAXenter
Data science and machine learning continue to advance. One area where they are becoming more relevant is data security: the market for AI in cybersecurity is expected to reach almost $35 billion by 2025. Data scientists can apply their knowledge to the cybersecurity field to help protect against attacks and identify suspicious behavior. Because they play versatile roles as technical experts, problem gatherers, analysts, and skilled interpreters, data scientists are well placed to solve these problems. By using data science knowledge, coders and programmers can also improve their techniques to build better programs that protect against cyber threats.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.88)